Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Authors
Abstract
We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns, such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends’ activity, thus providing side information about each other’s preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) on the regret of any uniformly good policy that achieves the maximum long-term average reward, as a function of the bipartite network structure. 2) We propose two policies: a randomized policy, and a policy based on the well-known upper confidence bound (UCB) policies, both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor that is independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.
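To make the model concrete, the following is a minimal toy sketch (not the paper's exact policy) of a UCB1-style bandit with side-observations: a bipartite graph connects actions to unknowns, and choosing an action reveals a sample of every unknown it is connected to. The graph, reward model (mean of revealed samples), and all names below are illustrative assumptions.

```python
import math
import random

# Illustrative sketch only, not the authors' policy. Actions 0-2 are each
# connected to a subset of 4 unknowns; pulling an action reveals a Bernoulli
# sample of every connected unknown (the "side-observations").
edges = {0: [0, 1], 1: [1, 2], 2: [2, 3]}   # action -> connected unknowns
true_means = [0.2, 0.8, 0.6, 0.3]           # Bernoulli mean of each unknown

counts = [0] * 4    # number of observations per unknown
sums = [0.0] * 4    # running sum of samples per unknown

def ucb_index(action, t):
    """Optimistic index for an action, averaging its unknowns' UCB1 values."""
    vals = []
    for u in edges[action]:
        if counts[u] == 0:
            return float("inf")  # force exploration of any unseen unknown
        mean = sums[u] / counts[u]
        bonus = math.sqrt(2.0 * math.log(t) / counts[u])
        vals.append(mean + bonus)
    return sum(vals) / len(vals)

random.seed(0)
horizon = 5000
total_reward = 0.0
for t in range(1, horizon + 1):
    act = max(edges, key=lambda a: ucb_index(a, t))
    # Choosing `act` reveals one sample of every connected unknown,
    # so statistics accumulate even for unknowns shared with other actions.
    samples = {u: float(random.random() < true_means[u]) for u in edges[act]}
    for u, x in samples.items():
        counts[u] += 1
        sums[u] += x
    # Reward model (an assumption): mean of the revealed samples.
    total_reward += sum(samples.values()) / len(samples)

# Empirically best action after the run.
best = max(edges, key=lambda a: sum(sums[u] / counts[u] for u in edges[a]) / len(edges[a]))
print(best)
```

Because unknowns are shared across actions (unknown 1 is observed by both actions 0 and 1), every unknown accumulates observations even when its actions are rarely chosen; this is the mechanism that lets exploration rates depend on network position.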
Similar Works
Reward-Rate Maximization in Sequential Identification under a Stochastic Deadline
Abstract. Any intelligent system performing evidence-based decision making under time pressure must negotiate a speed-accuracy trade-off. In computer science and engineering, this is typically modeled as minimizing a Bayes-risk functional that is a linear combination of expected decision delay and expected terminal decision loss. In neuroscience and psychology, however, it is often modeled as m...
Expectation Maximization for Average Reward Decentralized POMDPs
Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize long-term effects of actions by a discount factor. In tasks like wireless networking, agents are evaluated by average performance over time, both short- and long-term effects of actions are crucial, and discounting based s...
Optimal Temporal Risk Assessment
Time is an essential feature of most decisions, because the reward earned from decisions frequently depends on the temporal statistics of the environment (e.g., on whether decisions must be made under deadlines). Accordingly, evolution appears to have favored a mechanism that predicts intervals in the seconds to minutes range with high accuracy on average, but significant variability from trial...
Dynamics of betting behavior under flat reward condition
One of the missions of the cognitive process of animals, including humans, is to make reasonable judgments and decisions in the presence of uncertainty. The balance between exploration and exploitation investigated in the reinforcement-learning paradigm is one of the key factors in this process. Recently, following the pioneering work in behavioral economics, growing attention has been directed...